Local Relevance Weighted Maximum Margin Criterion for Text Classification

Authors

  • Quanquan Gu
  • Jie Zhou
Abstract

Text classification is an important task in information retrieval and data mining. In the vector space model (VSM), a document is represented as a high-dimensional vector, and a feature extraction phase is usually needed to reduce the dimensionality of that representation. In this paper, we propose a feature extraction method named Local Relevance Weighted Maximum Margin Criterion (LRWMMC). It aims to learn a subspace in which, within the local region of each document, documents of the same class are as near as possible while documents of different classes are as far apart as possible. Furthermore, relevance is taken into account as a weight that determines the extent to which the documents are projected. LRWMMC is able to find the low-dimensional manifold embedded in the high-dimensional ambient space. In addition, we generalize LRWMMC to a Reproducing Kernel Hilbert Space (RKHS), which can handle nonlinearity in the input space. We also generalize LRWMMC to tensor space, which suits a newer document representation named the tensor space model (TSM). To utilize the large amount of unlabeled documents, we also present a Semi-Supervised LRWMMC, which aims to find a projection inferred from both the labeled and the unlabeled samples. Finally, we present a fast algorithm based on QR decomposition that makes the proposed methods applicable to large-scale data sets. Encouraging experimental results on benchmark text classification data sets indicate that the proposed methods outperform many existing feature extraction methods for text classification.
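The core idea in the abstract (pull same-class neighbors together and push different-class neighbors apart within each document's local region) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the relevance weights, so every neighbor pair is given weight 1 here, and the function name `local_mmc_projection` and the parameters `n_components` and `k` are hypothetical. It only shows a generic locality-restricted maximum margin criterion, solved by an eigendecomposition of the difference between a local between-class scatter and a local within-class scatter.

```python
import numpy as np

def local_mmc_projection(X, y, n_components=2, k=5):
    """Illustrative locality-restricted maximum margin criterion.

    X: (n_samples, n_features) document vectors.
    y: (n_samples,) class labels.
    Returns a (n_features, n_components) projection matrix.
    Assumption: all neighbor pairs have weight 1 (no relevance weighting).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Pairwise squared Euclidean distances; exclude self-matches.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)

    s_between = np.zeros((d, d))  # scatter over different-class neighbors
    s_within = np.zeros((d, d))   # scatter over same-class neighbors
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:
            diff = X[i] - X[j]
            outer = np.outer(diff, diff)
            if y[i] == y[j]:
                s_within += outer
            else:
                s_between += outer

    # Maximize trace(W^T (S_b - S_w) W) subject to W^T W = I:
    # take the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(s_between - s_within)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

# Usage on synthetic data (random vectors standing in for documents).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
y_demo = np.array([0, 1] * 15)
W_demo = local_mmc_projection(X_demo, y_demo, n_components=2, k=4)
X_low = X_demo @ W_demo  # documents projected into the learned subspace
```

Because the criterion is a difference of scatter matrices rather than a ratio, no matrix inverse is needed, which is one commonly cited reason for using a maximum margin criterion when the number of features exceeds the number of samples.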


Similar resources

Presentation of quasi-linear piecewise selected models simultaneously with designing of bump-less optimal robust controller for nonlinear vibration control of composite plates

The idea of using quasi-linear piecewise models is based on decomposing complicated nonlinear systems while simultaneously designing local controllers. Since proper performance and closed-loop stability of the final system are vital in multi-model controller design, the main problem in multi-model controllers is the number of the local models and their position not payi...


Improving Chernoff criterion for classification by using the filled function

Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion to measure distances between distributions. In the p...


Face recognition using adaptive margin fisher's criterion and linear discriminant analysis (AMFC-LDA)

Selecting a low-dimensional feature subspace from thousands of features is key to optimal classification. Linear Discriminant Analysis (LDA) is a basic, well-recognized supervised classifier that is effectively employed for classification. However, two within-class problems arise during discriminant analysis. Firstly, in the training phase the number of samples within a class is smal...


A Recursive Information Gene Selection Using Improved Laplacian Maximum Margin Criterion

Gene selection is an important research topic in pattern recognition and tumor classification. Numerous methods have been proposed; Maximum Margin Criterion (MMC) is one of the well-known methods proposed to solve the small sample size problem. However, MMC only considers the global structure of samples. In this article, a novel recursive gene selection criterion named Laplacian Maximum ...


Learning SVM with weighted maximum margin criterion for classification of imbalanced data

For a kernel-based method such as the support vector machine, performance depends on whether the selected kernel matches the data. Conventional support vector classifiers are not suitable for imbalanced learning tasks, since they tend to assign instances to the majority class, which is the less important class. In this paper, we propose a weighted maximum margin criterion to optimize the data-...




Publication date: 2009